Democratizing ML for Enterprise Security: A Self-Sustained Attack Detection Framework

Momeni, Sadegh, Zhang, Ge, Huber, Birkett, Harkous, Hamza, Lipton, Sam, Seguin, Benoit, Pavlidis, Yanis

arXiv.org Artificial Intelligence

Abstract--Despite advancements in machine learning for security, rule-based detection remains prevalent in Security Operations Centers due to the resource intensiveness and skill gap associated with ML solutions. While traditional rule-based methods offer efficiency, their rigidity leads to high false positives or negatives and requires continuous manual maintenance. This paper proposes a novel, two-stage hybrid framework to democratize ML-based threat detection. The first stage employs intentionally loose YARA rules for coarse-grained filtering, optimized for high recall. To overcome data scarcity, the system leverages Simula, a seedless synthetic data generation framework, enabling security analysts to create high-quality training datasets without extensive data science expertise or pre-labeled examples. A continuous feedback loop incorporates real-time investigation results to adaptively tune the ML model, preventing rule degradation. This proposed model with active learning has been rigorously tested over a prolonged period in a production environment spanning tens of thousands of systems. The system handles initial raw log volumes often reaching 250 billion events per day, significantly reducing them through filtering and ML inference to a handful of daily tickets for human investigation. Live experiments over an extended timeline demonstrate a general improvement in the model's precision over time due to the active learning feature. This approach offers a self-sustained, low-overhead, and low-maintenance solution, allowing security professionals to guide model learning as expert "teachers".

Despite significant advancements in machine learning (ML) for security, traditional rule-based detection remains the predominant approach in enterprise security operations.
This is evidenced by the low adoption rate of ML-based technologies in Security Operations Centers (SOC), with one study [1] finding that only 10% of participating SOCs utilized AI/ML security monitoring tools.
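The two-stage design described in this abstract (a loose high-recall rule filter, an ML scorer, and an analyst feedback loop) can be illustrated with a minimal sketch. Everything below is hypothetical: the rule, keywords, weights, and event shape are invented for illustration and are not the paper's actual YARA rules or model.

```python
import re

# Stage 1: an intentionally loose rule (high recall) filters raw events.
# Stage 2: a toy additive scorer ranks survivors; analyst verdicts feed
# back into the weights (active learning). All names are illustrative.

LOOSE_RULE = re.compile(r"(powershell|mimikatz|base64)", re.IGNORECASE)

def coarse_filter(events):
    """Stage 1: keep anything the loose rule matches (favours recall)."""
    return [e for e in events if LOOSE_RULE.search(e["cmdline"])]

class ToyScorer:
    """Stage 2: keyword weights, tuned by analyst feedback."""
    def __init__(self):
        self.weights = {"mimikatz": 2.0, "base64": 0.5, "powershell": 0.2}

    def score(self, event):
        text = event["cmdline"].lower()
        return sum(w for kw, w in self.weights.items() if kw in text)

    def feedback(self, event, is_malicious, lr=0.5):
        """An analyst verdict nudges the weights of matching keywords."""
        delta = lr if is_malicious else -lr
        for kw in self.weights:
            if kw in event["cmdline"].lower():
                self.weights[kw] += delta

events = [
    {"cmdline": "notepad.exe report.txt"},
    {"cmdline": "powershell -enc base64payload"},
    {"cmdline": "mimikatz sekurlsa::logonpasswords"},
]
survivors = coarse_filter(events)          # stage 1 drops the benign event
scorer = ToyScorer()
ranked = sorted(survivors, key=scorer.score, reverse=True)
scorer.feedback(ranked[0], is_malicious=True)  # analyst confirms top alert
```

The point of the sketch is the division of labour: the cheap rule only needs to avoid false negatives, while the continuously retuned scorer handles precision, which is how the pipeline compresses billions of events into a handful of tickets.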


Systems-Theoretic and Data-Driven Security Analysis in ML-enabled Medical Devices

Mitra, Gargi, Hallajiyan, Mohammadreza, Kim, Inji, Dharmalingam, Athish Pranav, Elnawawy, Mohammed, Iqbal, Shahrear, Pattabiraman, Karthik, Alemzadeh, Homa

arXiv.org Artificial Intelligence

The integration of AI/ML into medical devices is rapidly transforming healthcare by enhancing diagnostic and treatment facilities. However, this advancement also introduces serious cybersecurity risks due to the use of complex and often opaque models, extensive interconnectivity, interoperability with third-party peripheral devices, Internet connectivity, and vulnerabilities in the underlying technologies. These factors contribute to a broad attack surface and make threat prevention, detection, and mitigation challenging. Given the highly safety-critical nature of these devices, a cyberattack on these devices can cause the ML models to mispredict, thereby posing significant safety risks to patients. Therefore, ensuring the security of these devices from the time of design is essential. This paper underscores the urgency of addressing the cybersecurity challenges in ML-enabled medical devices at the pre-market phase. We begin by analyzing publicly available data on device recalls and adverse events, and known vulnerabilities, to understand the threat landscape of AI/ML-enabled medical devices and their repercussions on patient safety. Building on this analysis, we introduce a suite of tools and techniques designed by us to assist security analysts in conducting comprehensive premarket risk assessments. Our work aims to empower manufacturers to embed cybersecurity as a core design principle in AI/ML-enabled medical devices, thereby making them safe for patients.


RAPID: Robust APT Detection and Investigation Using Context-Aware Deep Learning

Amaru, Yonatan, Wudali, Prasanna, Elovici, Yuval, Shabtai, Asaf

arXiv.org Artificial Intelligence

Advanced persistent threats (APTs) pose significant challenges for organizations, leading to data breaches, financial losses, and reputational damage. Existing provenance-based approaches for APT detection often struggle with high false positive rates, a lack of interpretability, and an inability to adapt to evolving system behavior. We introduce RAPID, a novel deep learning-based method for robust APT detection and investigation, leveraging context-aware anomaly detection and alert tracing. By utilizing self-supervised sequence learning and iteratively learned embeddings, our approach effectively adapts to dynamic system behavior. The use of provenance tracing both enriches the alerts and enhances the detection capabilities of our approach. Our extensive evaluation demonstrates RAPID's effectiveness and computational efficiency in real-world scenarios. In addition, RAPID achieves higher precision and recall than state-of-the-art methods, significantly reducing false positives. RAPID integrates contextual information and facilitates a smooth transition from detection to investigation, providing security teams with detailed insights to efficiently address APT threats.


AI Code Generators for Security: Friend or Foe?

Natella, Roberto, Liguori, Pietro, Improta, Cristina, Cukic, Bojan, Cotroneo, Domenico

arXiv.org Artificial Intelligence

Abstract--Recent advances of AI code generators are opening new opportunities in software security research, including misuse by malicious actors. We make the case that cybersecurity professionals need to leverage AI code generators. We review use cases for AI code generators for security, and introduce an evaluation benchmark for these tools. Recent studies analyzed this technology in the context of generating malware and malicious content, using highly scalable deep-learning architectures.


Detecting Anomalous Network Communication Patterns Using Graph Convolutional Networks

Vaisman, Yizhak, Katz, Gilad, Elovici, Yuval, Shabtai, Asaf

arXiv.org Artificial Intelligence

To protect an organization's endpoints from sophisticated cyberattacks, advanced detection methods are required. In this research, we present GCNetOmaly, a graph convolutional network (GCN)-based variational autoencoder (VAE) anomaly detector trained on data that include connection events among internal and external machines. As input, the proposed GCN-based VAE model receives two matrices: (i) the normalized adjacency matrix, which represents the connections among the machines, and (ii) the feature matrix, which includes various features (demographic, statistical, process-related, and Node2vec structural features) that are used to profile the individual nodes/machines. After training the model on data collected for a predefined time window, the model is applied on the same data; the reconstruction score obtained by the model for a given machine then serves as the machine's anomaly score. GCNetOmaly was evaluated on real, large-scale data logged by Carbon Black EDR from a large financial organization's automated teller machines (ATMs) as well as communication with Active Directory (AD) servers in two setups: unsupervised and supervised. The results of our evaluation demonstrate GCNetOmaly's effectiveness in detecting anomalous behavior of machines on unsupervised data.
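The core mechanism here (propagate per-machine features over a normalized adjacency matrix, reconstruct them, and read the per-node reconstruction error as the anomaly score) can be sketched in a few lines. This is a stand-in, not GCNetOmaly itself: the graph, features, and untrained linear encoder/decoder below replace the paper's trained GCN-based VAE purely for illustration.

```python
import numpy as np

# Sketch: one GCN-style propagation layer over a symmetrically normalized
# adjacency matrix, a linear "decode" back to feature space, and per-node
# reconstruction error as the anomaly score. Weights are random stand-ins.

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)   # machine-to-machine connections
A_hat = A + np.eye(4)                       # add self-loops
D_inv_sqrt = np.diag(1 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization
X = rng.normal(size=(4, 8))                 # per-machine feature matrix

W_enc = rng.normal(size=(8, 3))             # untrained encoder weights
W_dec = rng.normal(size=(3, 8))             # untrained decoder weights
Z = np.tanh(A_norm @ X @ W_enc)             # one GCN encoding layer
X_rec = Z @ W_dec                           # reconstruct feature space

anomaly_score = np.linalg.norm(X - X_rec, axis=1)  # per-machine score
```

In the real system the encoder/decoder are a trained VAE, so a machine whose connection pattern and features fit the learned profile reconstructs well (low score), while an anomalous ATM stands out with a high score.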


Knowledge-enhanced Neuro-Symbolic AI for Cybersecurity and Privacy

Piplai, Aritran, Kotal, Anantaa, Mohseni, Seyedreza, Gaur, Manas, Mittal, Sudip, Joshi, Anupam

arXiv.org Artificial Intelligence

Neuro-Symbolic Artificial Intelligence (AI) is an emerging and quickly advancing field that combines the subsymbolic strengths of (deep) neural networks and explicit, symbolic knowledge contained in knowledge graphs to enhance explainability and safety in AI systems. This approach addresses a key criticism of current generation systems, namely their inability to generate human-understandable explanations for their outcomes and ensure safe behaviors, especially in scenarios with "unknown unknowns" (e.g. cybersecurity, privacy). The integration of neural networks, which excel at exploring complex data spaces, and symbolic knowledge graphs, which represent domain knowledge, allows AI systems to reason, learn, and generalize in a manner understandable to experts. This article describes how applications in cybersecurity and privacy, two of the most demanding domains in terms of the need for AI to be explainable while being highly accurate in complex environments, can benefit from Neuro-Symbolic AI.


Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)

Sabir, Bushra, Babar, M. Ali, Abuadbba, Sharif

arXiv.org Artificial Intelligence

Transformer-based text classifiers like BERT, RoBERTa, T5, and GPT-3 have shown impressive performance in NLP. However, their vulnerability to adversarial examples poses a security risk. Existing defense methods lack interpretability, making it hard to understand adversarial classifications and identify model vulnerabilities. To address this, we propose the Interpretability and Transparency-Driven Detection and Transformation (IT-DT) framework. It focuses on interpretability and transparency in detecting and transforming textual adversarial examples. IT-DT utilizes techniques like attention maps, integrated gradients, and model feedback for interpretability during detection. This helps identify salient features and perturbed words contributing to adversarial classifications. In the transformation phase, IT-DT uses pre-trained embeddings and model feedback to generate optimal replacements for perturbed words. By finding suitable substitutions, we aim to convert adversarial examples into non-adversarial counterparts that align with the model's intended behavior while preserving the text's meaning. Transparency is emphasized through human expert involvement. Experts review and provide feedback on detection and transformation results, enhancing decision-making, especially in complex scenarios. The framework generates insights and threat intelligence empowering analysts to identify vulnerabilities and improve model robustness. Comprehensive experiments demonstrate the effectiveness of IT-DT in detecting and transforming adversarial examples. The approach enhances interpretability, provides transparency, and enables accurate identification and successful transformation of adversarial inputs. By combining technical analysis and human expertise, IT-DT significantly improves the resilience and trustworthiness of transformer-based text classifiers against adversarial attacks.
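The transformation phase described above, locating a perturbed word and replacing it with a suitable in-vocabulary substitute via embeddings, can be caricatured in a toy example. The vocabulary, embeddings, and saliency proxy below are all fabricated stand-ins; IT-DT uses pre-trained embeddings plus model feedback rather than this distance heuristic.

```python
import numpy as np

# Toy sketch of the transformation phase: flag the token farthest from any
# vocabulary embedding as the likely perturbed word, then replace it with
# its nearest in-vocabulary neighbour. All vectors here are fabricated.

vocab = {"good": np.array([1.0, 0.0]),
         "bad": np.array([-1.0, 0.0]),
         "movie": np.array([0.0, 1.0])}

def embed(token):
    # Perturbed tokens like "go0d" land near, but not on, a vocab vector.
    return vocab.get(token, np.array([0.9, 0.1]))

def repair(tokens):
    """Replace the most out-of-vocabulary token with its nearest neighbour."""
    dists = {t: min(np.linalg.norm(embed(t) - v) for v in vocab.values())
             for t in tokens}
    worst = max(dists, key=dists.get)        # suspected perturbed word
    nearest = min(vocab,
                  key=lambda w: np.linalg.norm(embed(worst) - vocab[w]))
    return [nearest if t == worst else t for t in tokens]
```

In the full framework this substitution step is guided by the classifier's own feedback (does the repaired text flip back to the intended label?), which the toy above omits.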


What is generative AI and its use cases? - Information Age

#artificialintelligence

Generative AI is a technological marvel destined to change the way we work, but what does it do and what are its use cases for CTOs? Where better to start than with a definition straight from the horse's mouth? ChatGPT-4, from Microsoft-backed OpenAI, describes generative AI as "…a type of artificial intelligence that involves training algorithms to generate new content based on a given set of parameters or data. This technology has been used in a variety of fields, from image and video generation to natural language processing and music creation". Apart from being endearingly modest, generative AI also has the potential to transform economies.


Exploring Optimal Deep Learning Models for Image-based Malware Variant Classification

Mitsuhashi, Rikima, Shinagawa, Takahiro

arXiv.org Artificial Intelligence

Analyzing a huge amount of malware is a major burden for security analysts. Since emerging malware is often a variant of existing malware, automatically classifying malware into known families greatly reduces a part of their burden. Image-based malware classification with deep learning is an attractive approach for its simplicity, versatility, and affinity with the latest technologies. However, the impact of differences in deep learning models and the degree of transfer learning on the classification accuracy of malware variants has not been fully studied. In this paper, we conducted an exhaustive survey of deep learning models using 24 ImageNet pre-trained models and five fine-tuning parameters, totaling 120 combinations, on two platforms. As a result, we found that the highest classification accuracy was obtained by fine-tuning one of the latest deep learning models with a relatively low degree of transfer learning, and we achieved the highest classification accuracy ever in cross-validation on the Malimg and Drebin datasets. We also confirmed that this trend holds true for the recent malware variants using the VirusTotal 2020 Windows and Android datasets. The experimental results suggest that it is effective to periodically explore optimal deep learning models with the latest models and malware datasets by gradually reducing the degree of transfer learning from half.
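The "degree of transfer learning" this abstract varies can be made concrete: it amounts to how large a fraction of a pre-trained model's layers stay frozen during fine-tuning. The helper below is a hypothetical illustration of that knob; the layer names are invented and no actual model is loaded.

```python
# Hypothetical sketch of the "degree of transfer learning": given a
# pre-trained model's layers in order, freeze the first `freeze_frac`
# fraction and fine-tune the rest. Layer names are illustrative only.

def split_trainable(layers, freeze_frac):
    """Return (frozen, trainable) layer names for a given freeze fraction."""
    cut = int(len(layers) * freeze_frac)
    return layers[:cut], layers[cut:]

layers = [f"block{i}" for i in range(10)] + ["classifier"]
# Start from "half" frozen, the paper's suggested starting point, and
# gradually reduce freeze_frac when re-exploring with newer models.
frozen, trainable = split_trainable(layers, freeze_frac=0.5)
```

In a deep-learning framework the same split would be applied by disabling gradients on the frozen layers (e.g. setting their parameters non-trainable) before fine-tuning on the malware-image dataset.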


The Next Generation of Threat Detection Will Require Both Human and Machine Expertise

#artificialintelligence

There is a debate in the world of cybersecurity about whether to use human or machine expertise. However, this is a false dichotomy: Truly effective threat detection and response need both kinds of expertise working in tandem. It will be years before machines completely replace the humans who perform typical detection and response tasks. What we predict for the meantime is a symbiotic relationship between humans and machines. The combination means that detection of and response to threats can be faster and more intelligent.